Optimal network topologies for local search with congestion 
The problem of searchability in decentralized complex networks is of great importance in computer science,
economics and sociology. We present a formalism that copes simultaneously with the problem of search
and with the congestion effects that arise when parallel searches are performed, and we obtain expressions for the average
search cost, written in terms of the search algorithm and the topological properties of the network, both
in the presence and in the absence of congestion. This formalism is used to obtain optimal network structures for a
system using a local search algorithm. It is found that only two classes of networks can be optimal: star-like
configurations, when the number of parallel searches is small, and homogeneous-isotropic configurations, when
the number of parallel searches is large.
Recently, the study of topological and dynamical properties
of complex networks has received a lot of interest [1, 2, 3].
Part of this interest comes from the attempt to understand the
topology and behavior of computer-based communication networks
such as the Internet [4] and the World Wide Web.
However, the study of communication processes in a wider
sense is also of interest in other fields, notably the design of
organizations [6, 7, 8].

One of the general principles that has been discovered in 
many such complex networks is the short average distance 
between nodes. More surprisingly, it has been shown that
these short paths can be found with essentially local strate- 
gies, i.e. with strategies that do not require precise global in- 
formation of the network. Indeed, for social networks, this 
fact was experimentally confirmed a long time ago by the famous
experiment of Travers and Milgram [10], and theoretical
explanations have been given by Kleinberg [11] and, more recently,
by Watts et al. [12]. These explanations are based
on the plausible assumption that there is a structure (social, 
geographical, etc.) that underlies the complex social network 
and provides information that can be exploited heuristically in 
a search process. In scale-free communication networks and 
in some decentralized peer-to-peer communication networks 
such as Gnutella or Freenet, it has been shown [13, 14] that
the skewness of the degree distribution and the existence of
highly connected hubs allow the design of algorithms that
search quite efficiently even when the size of the system is 
large. 

Our approach in the present work is complementary to these 
efforts. The question we pose is the following: given a search 
algorithm that uses purely local information (i.e. knowledge
of the first neighbors in the network) and a fixed set of
resources (i.e. a fixed number of nodes and links), which is
the topology that optimizes the search process? We consider
a general situation where the network has to tackle several
simultaneous (or parallel) search problems, which in turn raises
the important issue of congestion [15, 16, 17] at overburdened
nodes. Indeed, for a single search problem the optimal
network is clearly a highly polarized star-like structure. This 
structure is cheap to assemble in terms of number of links and 
efficient in terms of searchability, since the average cost (num- 
ber of steps) to find a given node is always bounded (2 steps), 
independently of the size of the system. However, the po- 
larized star-like structure will become inefficient when many 
search processes coexist in parallel in the network, due to the 
limitation of the central node. 

The discovery of optimal structures will be a useful guide 
to design, redesign and drive the evolution of communication 
networks such as peer-to-peer networks, distributed databases, 
and organizations. 

In this paper we present a formalism that is able to 
cope with search and congestion simultaneously, allowing 
the determination of optimal topologies. This formalism 
avoids the problem of simulating the dynamics of the search- 
communication process which turns out to be impracticable, 
especially close to the congestion point where search costs
(time) diverge. We do not focus on detailed models of any of 
the above mentioned communication networks (organizations, 
computer networks, etc). Rather, we study a general scenario 
applicable to any communication process. First we calculate 
the average number of steps (search cost) needed to find a cer- 
tain node in the network given the search algorithm and the 
topology of the network. The calculation is exact if the search 
algorithm is Markovian. Next, congestion is introduced by
assuming that the network is formed by nodes that behave like
queues, meaning that they are able to deliver only a finite number of
packets at each time step [16, 17, 19]. In this context, we are
able (i) to calculate explicitly the point at which the arrival rate
of packets leads to network collapse, in the sense that the average
time needed to perform a search becomes unbounded, and
(ii) to determine, below the point of collapse, how the average
search time depends on the rate at which search processes are
started. In both cases, the relevant quantities are expressed in
terms of the topology of the network and the search algorithm.
Finally, we obtain optimal structures by performing exhaustive
generalized simulated annealing [20, 21] in the space of the
networks with fixed size and connectivity. We find that when 
the number of parallel searches is small, the star-like config- 
uration turns out to be optimal as expected, while for a large 
number of parallel searches, a very decentralized and uniform 
network is best. Surprisingly, no other structures apart from 
these extremely-centralized and extremely-decentralized net- 
works are found to be optimal. 

First, we consider the average cost to find a given node in 
an arbitrary communication network when there is no conges- 
tion. Specifically, we focus on a single information packet 
at node i whose destination is node k, i.e. a packet searching 
for $k$. The probability for the packet to go from $i$ to a new
node $j$ in its next movement is $p^k_{ij}$. In particular, $p^k_{kj} = 0 \;\forall j$,
so that the packet is removed as soon as it arrives at its destination.
The precise form of $p^k_{ij}$ will depend on the search
algorithm. In particular, when the search is Markovian, $p^k_{ij}$
does not depend on previous positions of the packet. In this
case, the probability of going from $i$ to $j$ in $n$ steps is given by

$$P^k_{ij}(n) = \sum_{l_1, l_2, \ldots, l_{n-1}} p^k_{i l_1}\, p^k_{l_1 l_2} \cdots p^k_{l_{n-1} j}. \qquad (1)$$

Thus, defining the matrices $p^k$ and $P^k(n)$, with elements $p^k_{ij}$ and $P^k_{ij}(n)$, we have

$$P^k(n) = \left(p^k\right)^n. \qquad (2)$$
We next define the effective distance matrices

$$d^k = \sum_{n=1}^{\infty} n\, P^k(n) = \sum_{n=1}^{\infty} n \left(p^k\right)^n = \left(I - p^k\right)^{-2} p^k, \qquad (3)$$

whose elements $d^k_{ij}$ are the average number of steps needed to
go from $i$ to $j$ for a packet traveling towards $k$ [25]. In particular,
the element $d^k_{ik}$ is the average number of steps needed
to find k starting from i. When the search algorithm is such 
that the packets follow minimum paths between nodes, the ef- 
fective distance will coincide with the topological minimum 
distance; otherwise, the effective distance between nodes will 
be, in general, larger than the topological minimum distance. 
Finally, the average search cost in the network when there is
no congestion is

$$\bar{d} = \frac{1}{S(S-1)} \sum_{i,k} d^k_{ik}, \qquad (4)$$

where $S$ is the number of nodes in the network.
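As a minimal numerical sketch of the effective distance (not the authors' code), the closed form $(I - p^k)^{-2} p^k$ can be checked against the truncated series $\sum_n n (p^k)^n$ on a toy example. The three-node path 0 - 1 - 2 with destination $k = 2$, the walker that moves to a uniformly random neighbor, and the function names are all assumptions made here for illustration.

```python
import numpy as np

def effective_distance(p_k):
    """Closed form of the effective distance: d^k = (I - p^k)^{-2} p^k."""
    identity = np.eye(p_k.shape[0])
    inverse = np.linalg.inv(identity - p_k)
    return inverse @ inverse @ p_k

def effective_distance_series(p_k, n_max=200):
    """Truncated series sum_{n=1}^{n_max} n (p^k)^n, for comparison only."""
    total = np.zeros_like(p_k)
    power = np.eye(p_k.shape[0])
    for n in range(1, n_max + 1):
        power = power @ p_k          # power is now (p^k)^n
        total += n * power
    return total

# Toy example (assumed): path 0 - 1 - 2, destination k = 2.
# Row k is zero because the packet is removed on arrival.
p_k = np.array([[0.0, 1.0, 0.0],
                [0.5, 0.0, 0.5],
                [0.0, 0.0, 0.0]])

d_k = effective_distance(p_k)
print(d_k[:, 2])   # average number of steps to reach node 2 from each node
```

For this toy walk the column for the destination should equal the mean first-passage times, 4 steps from node 0 and 3 steps from node 1, up to floating-point rounding.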

Consider next the centrality of each of the nodes
in the communication network. First, we calculate the average
number of times, $b^k_{ij}$, that a packet generated at $i$ and with
destination $k$ passes through $j$. According to the previous definitions,

$$b^k = \sum_{n=1}^{\infty} P^k(n) = \sum_{n=1}^{\infty} \left(p^k\right)^n = \left(I - p^k\right)^{-1} p^k. \qquad (5)$$

The effective betweenness of node $j$, $B_j$, is defined as

$$B_j = \sum_{i,k} b^k_{ij}. \qquad (6)$$



Again, as in the case of the effective distance, when the search 
algorithm is able to find the minimum paths between nodes, 
the effective betweenness will coincide with the topological 
betweenness, $\beta_j$, as usually defined [22, 23]. The effective
betweenness of the nodes in a network contains valuable in- 
formation about its behavior when multiple searches are per- 
formed simultaneously and congestion considerations become 
relevant. 
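The effective betweenness can be sketched numerically as follows, taking $b^k = (I - p^k)^{-1} p^k$ and $B_j = \sum_{i,k} b^k_{ij}$. The search rule used here, a blind walk to a uniformly random neighbor that is removed at its destination, is an assumption chosen only to have a concrete family of $p^k$ matrices; the function names and the three-node path are hypothetical.

```python
import numpy as np

def effective_betweenness(p):
    """B_j = sum_{i,k} b^k_ij with b^k = (I - p^k)^{-1} p^k.

    p has shape (S, S, S); p[k] is the transition matrix for destination k.
    """
    S = p.shape[0]
    identity = np.eye(S)
    B = np.zeros(S)
    for k in range(S):
        b_k = np.linalg.solve(identity - p[k], p[k])
        B += b_k.sum(axis=0)         # sum over origins i for every node j
    return B

def blind_walk_matrices(adj):
    """Assumed search rule: uniform random neighbour, removed at destination."""
    adj = np.asarray(adj, dtype=float)
    S = adj.shape[0]
    walk = adj / adj.sum(axis=1, keepdims=True)
    p = np.repeat(walk[None, :, :], S, axis=0)
    for k in range(S):
        p[k, k, :] = 0.0             # p^k_kj = 0 for all j
    return p

# Path 0 - 1 - 2 (illustrative).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
B = effective_betweenness(blind_walk_matrices(path))
print(B)
```

Under these assumptions the middle node carries twice the traffic of the end nodes. Note that $B_j$ as defined here also counts the arrival of a packet at its destination.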

Consider the following general scenario. In the communication
network, each node generates packets at a rate $\rho$ per
unit of time, independently of the rest of the nodes. The destination
of each of these packets is randomly fixed at the moment
of its creation. On the other hand, the nodes are queues
that can store as many packets as needed but can deliver, on
average, only a finite number of them at each time step;
without loss of generality, we fix this number to 1. It is known
[16, 17, 18] that for low values of $\rho$ the system reaches a
steady state in which the total number of floating packets in
the network, $N(t)$, fluctuates around a finite value. As $\rho$
increases, the system undergoes a continuous phase transition
to a congested phase in which $N(t) \propto t$. Right at the
critical point, $\rho_c$, quantities such as $N(t)$ and the characteristic
time diverge [24]. Below $\rho_c$, there is no accumulation at
any node in the network and the number of packets that arrive
at node $j$ is, on average, $\rho B_j/(S-1)$. Therefore, a particular
node will collapse when $\rho B_j/(S-1) > 1$ and the critical
congestion point of the network will be

$$\rho_c = \frac{S-1}{B^*}, \qquad (7)$$

where $B^*$ is the maximum effective betweenness in the network,
which corresponds to the most central node.
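The collapse condition translates into a one-line computation, $\rho_c = (S-1)/B^*$. The sketch below assumes the effective betweenness values are already known; the numbers used for the five-node star are illustrative inputs and the function names are hypothetical.

```python
import numpy as np

def critical_rate(B):
    """rho_c = (S - 1) / B*, with B* the largest effective betweenness."""
    B = np.asarray(B, dtype=float)
    return (B.size - 1) / B.max()

def saturated_nodes(B, rho):
    """Indices of nodes whose arrival rate rho * B_j / (S - 1) exceeds 1."""
    B = np.asarray(B, dtype=float)
    return np.flatnonzero(rho * B / (B.size - 1) > 1.0)

# Illustrative betweenness values for a 5-node star, hub first (assumed input).
B_star = [16.0, 4.0, 4.0, 4.0, 4.0]
rho_c = critical_rate(B_star)
print(rho_c, saturated_nodes(B_star, rho=0.3))
```

With these inputs only the hub saturates once the generation rate exceeds the critical value, which is the limitation of the central node discussed in the text.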

To calculate the average load of the network, $\langle N(t) \rangle$,
it is necessary to establish the behavior of the queues. In
the general scenario proposed above, the arrival of packets
at a given node $j$ is a Poisson process with mean $\mu_j = \rho B_j/(S-1)$.
Regarding the delivery of packets, assume the
simplest case in which it is also a Poisson process and hence
the time between two consecutive packet deliveries follows an
exponential distribution. In general, when the arrival and
delivery processes are Poisson, the average size of the queues
is given by

$$\langle n_j \rangle = \frac{\mu_j}{1-\mu_j} = \frac{\rho B_j/(S-1)}{1 - \rho B_j/(S-1)}. \qquad (8)$$

The average load of the network $\langle N(t) \rangle$ is

$$\langle N(t) \rangle = \sum_{j=1}^{S} \langle n_j \rangle = \sum_{j=1}^{S} \frac{\rho B_j/(S-1)}{1 - \rho B_j/(S-1)}. \qquad (9)$$
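Eq. (9) can be evaluated directly once the effective betweenness of every node is known. The following is a minimal sketch under that assumption, with each queue contributing $\mu_j/(1-\mu_j)$ for arrival rate $\mu_j = \rho B_j/(S-1)$ and unit delivery rate; the betweenness values and the function name are illustrative, not taken from the paper.

```python
import numpy as np

def network_load(B, rho):
    """Eq. (9): sum_j mu_j / (1 - mu_j) with mu_j = rho * B_j / (S - 1).

    Returns infinity when some node is at or beyond saturation (mu_j >= 1).
    """
    B = np.asarray(B, dtype=float)
    mu = rho * B / (B.size - 1)
    if np.any(mu >= 1.0):
        return np.inf
    return float(np.sum(mu / (1.0 - mu)))

# Illustrative betweenness values for a 5-node star, hub first (assumed input).
B_star = [16.0, 4.0, 4.0, 4.0, 4.0]
load = network_load(B_star, rho=0.1)
print(load)
```

Returning infinity at saturation is a design choice of this sketch: it lets the same function serve as a cost in an optimization loop without special-casing congested networks.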



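There are two interesting limiting cases of this expression.
When $\rho$ is very small, $\langle n_j \rangle \approx \mu_j$ and, taking into account that
$\sum_j b^k_{ij} = d^k_{ik}$, one obtains

$$\langle N(t) \rangle \approx \rho S \bar{d}, \qquad \rho \to 0. \qquad (10)$$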
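The intermediate steps behind Eq. (10) can be written out as follows. This is a sketch that assumes the arrival rate at node $j$ is $\mu_j = \rho B_j/(S-1)$ (the symbol $\mu_j$ is chosen here), that the effective betweenness is $B_j = \sum_{i,k} b^k_{ij}$, and that the average search cost is $\bar{d} = \sum_{i,k} d^k_{ik} / [S(S-1)]$.

```latex
\begin{aligned}
\langle N(t) \rangle
  &\approx \sum_{j=1}^{S} \mu_j
   = \frac{\rho}{S-1} \sum_{j=1}^{S} B_j
   = \frac{\rho}{S-1} \sum_{i,k} \sum_{j} b^k_{ij} \\
  &= \frac{\rho}{S-1} \sum_{i,k} d^k_{ik}
   = \frac{\rho}{S-1}\, S(S-1)\, \bar{d}
   = \rho S \bar{d}.
\end{aligned}
```

The first step uses $\mu_j/(1-\mu_j) \approx \mu_j$ for small $\mu_j$, and the fourth uses the identity $\sum_j b^k_{ij} = d^k_{ik}$ quoted in the text.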



On the other hand, when $\rho$ approaches $\rho_c$ most of the load
of the network comes from the most congested node, and
therefore [28]

$$\langle N(t) \rangle \approx \frac{1}{1 - \rho/\rho_c}, \qquad \rho \to \rho_c. \qquad (11)$$

It is worth noting that there are only two assumptions in the
calculations above. The first one has already been mentioned: 
the movement of the packets needs to be Markovian to define 
the jump probability matrices $p^k$. Although this is not strictly
true in real communication networks (where packets are not
usually allowed to go through a given node more than once),
it can be seen as a first approximation [16, 17, 19]. The second
assumption is that the jump probabilities $p^k_{ij}$ do not depend on
the congestion state of the network, although communication
protocols sometimes try to avoid congested regions, and then
$B_j = B_j(\rho)$ [29]. Our calculations, in particular the equations above,
correspond to the worst-case scenario and thus provide
bounds for more realistic scenarios in which the search
algorithm interactively avoids congestion.

The equations above enable us to tackle the problem
of finding optimal structures for local search. Optimality is
defined as minimization of the average time needed to perform
a search. Indeed, according to Little's Law, the average
time needed by a packet to reach its destination is proportional
to the total load of the network, and therefore minimizing
$\langle N(t) \rangle$ is equivalent to minimizing the average cost of a
search. In a local search scenario, the $p^k$ matrices are given
by



$$p^k_{ij} = a_{ik}\,\delta_{jk} + \left(1 - a_{ik} - \delta_{ik}\right) \frac{a_{ij}}{\sum_l a_{il}}, \qquad (12)$$

where $a_{ij}$ are the elements of the adjacency matrix of the network.
The first term corresponds to $i$ and $k$ being neighbors:
then the packet will go to $j$ if and only if $j = k$, i.e. the packet
will be sent directly to the destination. The second term corresponds
to $i$ and $k$ not being neighbors: in this case, $j$ is chosen
at random and uniformly among the neighbors of $i$. Finally,
the delta symbol ensures that $p^k_{kj} = 0 \;\forall j$ and the packet disappears
from the network.
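The local search rule described above can be turned into transition matrices and fed into the effective distance and betweenness formulas. The sketch below is an illustration, not the authors' implementation: it assumes a connected, undirected network with zero diagonal in the adjacency matrix, and the function names and the five-node star are hypothetical.

```python
import numpy as np

def local_search_matrices(adj):
    """Eq. (12): p[k][i, j] = a_ik d_jk + (1 - a_ik - d_ik) a_ij / sum_l a_il.

    Assumes a connected, undirected graph with a zero diagonal.
    """
    a = np.asarray(adj, dtype=float)
    S = a.shape[0]
    delta = np.eye(S)
    walk = a / a.sum(axis=1, keepdims=True)   # uniform choice among neighbours
    p = np.zeros((S, S, S))
    for k in range(S):
        p[k][:, k] = a[:, k]                                      # direct delivery
        p[k] += (1.0 - a[:, k] - delta[:, k])[:, None] * walk     # blind step
    return p

def search_statistics(p):
    """Return (average search cost, effective betweenness) for matrices p."""
    S = p.shape[0]
    identity = np.eye(S)
    B = np.zeros(S)
    total_steps = 0.0
    for k in range(S):
        b_k = np.linalg.solve(identity - p[k], p[k])      # (I - p^k)^{-1} p^k
        d_k = np.linalg.solve(identity - p[k], b_k)       # (I - p^k)^{-2} p^k
        B += b_k.sum(axis=0)
        total_steps += d_k[:, k].sum()
    return total_steps / (S * (S - 1)), B

# Star with 5 nodes: node 0 is the hub (example chosen for illustration).
S = 5
star = np.zeros((S, S))
star[0, 1:] = star[1:, 0] = 1.0
avg_cost, B = search_statistics(local_search_matrices(star))
print(avg_cost, B)
```

On a star the local rule always finds a shortest path (one step to or from the hub, two steps between leaves), so the average cost stays below 2 and the hub concentrates the traffic.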

The optimization process is carried out using generalized
simulated annealing as described in [20, 21]. Starting from a
given initial network configuration, random rewirings of individual
links are performed, the cost $\langle N(t) \rangle$ is evaluated according
to Eq. (9), and the change is accepted with a certain probability
that depends on a computational temperature, which is
decreased so that the system tends to explore regions of the
configuration space with lower and lower costs. Regarding the
cooling, at a given temperature, each node of the network is
allowed to try a rewiring. Then the temperature is decreased
by 1%, and the process is repeated until a minimum temperature
is reached or, alternatively, the system has remained unchanged
after a sufficiently large number of rewiring trials.
Different sets of initial conditions are explored: for a given
value of $\rho$, the optimization process is started from random
initial configurations and also from networks that turned out
to be optimal at similar values of $\rho$. Of all the realizations,
only the network with the smallest cost is considered optimal.
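The rewiring loop described above can be sketched as follows. Several simplifications are assumptions of this sketch and not the procedure of the paper: a plain Metropolis acceptance rule is swapped in for generalized simulated annealing, disconnected trial networks are rejected outright, the only stopping criterion is the minimum temperature, and the parameters `t_start`, `t_min` and `seed` are hypothetical.

```python
import math
import numpy as np

def network_load(adj, rho):
    """Cost <N(t)> of Eq. (9) under the local search rule; inf if congested."""
    S = adj.shape[0]
    identity = np.eye(S)
    walk = adj / adj.sum(axis=1, keepdims=True)
    B = np.zeros(S)
    for k in range(S):
        p_k = np.zeros((S, S))
        p_k[:, k] = adj[:, k]                                   # neighbours deliver directly
        p_k += (1.0 - adj[:, k] - identity[:, k])[:, None] * walk
        B += np.linalg.solve(identity - p_k, p_k).sum(axis=0)   # effective betweenness
    mu = rho * B / (S - 1)
    return math.inf if mu.max() >= 1.0 else float(np.sum(mu / (1.0 - mu)))

def is_connected(adj):
    """Depth-first reachability check starting from node 0."""
    seen, frontier = {0}, [0]
    while frontier:
        node = frontier.pop()
        for nb in np.flatnonzero(adj[node]):
            if int(nb) not in seen:
                seen.add(int(nb))
                frontier.append(int(nb))
    return len(seen) == adj.shape[0]

def random_rewiring(adj, rng):
    """Move one randomly chosen link to a randomly chosen empty position."""
    new = adj.copy()
    iu = np.triu_indices(adj.shape[0], k=1)
    links = np.flatnonzero(adj[iu] == 1)
    holes = np.flatnonzero(adj[iu] == 0)
    for idx, value in ((rng.choice(links), 0.0), (rng.choice(holes), 1.0)):
        i, j = iu[0][idx], iu[1][idx]
        new[i, j] = new[j, i] = value
    return new

def anneal(adj, rho, t_start=1.0, t_min=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    current, cost = adj.copy(), network_load(adj, rho)
    best, best_cost = current, cost
    T = t_start
    while T > t_min:
        for _ in range(adj.shape[0]):              # one rewiring trial per node
            trial = random_rewiring(current, rng)
            if not is_connected(trial):
                continue
            trial_cost = network_load(trial, rho)
            if trial_cost <= cost or rng.random() < math.exp(-(trial_cost - cost) / T):
                current, cost = trial, trial_cost
                if cost < best_cost:
                    best, best_cost = current, cost
        T *= 0.99                                   # decrease the temperature by 1%
    return best, best_cost

# Illustrative start: a 6-node ring plus one chord (7 links in total).
S = 6
ring = np.zeros((S, S))
for i in range(S):
    ring[i, (i + 1) % S] = ring[(i + 1) % S, i] = 1.0
ring[0, 3] = ring[3, 0] = 1.0
best, best_cost = anneal(ring, rho=0.01, t_min=1e-2)
print(best_cost)
```

Because the cost is evaluated from the closed-form load rather than by simulating packets, each trial needs only a handful of small linear solves, which is what makes this kind of search affordable.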

The results of the optimization process are shown in Fig. 1.
For $\rho \to 0$, the optimal network has a star-like centralized
structure as expected, which corresponds to the minimization
of the average effective distance between nodes (Eq. 10). At
the other extreme, for high values of $\rho$, the optimal structure
has to minimize the maximum betweenness of the network,
according to Eq. (11). This is accomplished by creating a homogeneous
network where all the nodes have essentially the
same degree, betweenness, etc. To characterize the networks
at all values of $\rho$, we introduce a measure of the polarization,
$\pi$, of the network:



(13) 



where $\beta$ is, as before, the topological betweenness of the
nodes. For star-like networks, the value of $\pi$ is large, while
for very homogeneous networks $\pi \approx 0$. Although one could
expect that optimal networks cover the whole range of values
from $\pi = \pi_{\mathrm{star}}$ to $\pi \approx 0$, the results of the optimization
process reveal a completely different scenario. According to
simulations, star-like configurations are optimal for $\rho < \rho^*$;
at this point, the homogeneous networks that minimize $B^*$
become optimal. Therefore there are only two types of structures
that can be optimal for a local search process: star-like
networks for $\rho < \rho^*$ and homogeneous networks for $\rho > \rho^*$.
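The existence of two regimes can be illustrated on a very small case by comparing the load of a star with that of a ring, used here as a stand-in for a homogeneous network. This is only a sketch under stated assumptions: the ring has one more link than the star, so it is not the fixed-link comparison performed in the optimization, and the network size, the rates and the function name are chosen for illustration.

```python
import numpy as np

def local_search_load(adj, rho):
    """<N(t)> from the queue formula, using the local search rule."""
    a = np.asarray(adj, dtype=float)
    S = a.shape[0]
    identity = np.eye(S)
    walk = a / a.sum(axis=1, keepdims=True)
    B = np.zeros(S)
    for k in range(S):
        p_k = np.zeros((S, S))
        p_k[:, k] = a[:, k]                                     # direct delivery
        p_k += (1.0 - a[:, k] - identity[:, k])[:, None] * walk
        B += np.linalg.solve(identity - p_k, p_k).sum(axis=0)   # effective betweenness
    mu = rho * B / (S - 1)
    return np.inf if np.any(mu >= 1.0) else float(np.sum(mu / (1.0 - mu)))

S = 5
star = np.zeros((S, S))
star[0, 1:] = star[1:, 0] = 1.0          # node 0 is the hub
ring = np.zeros((S, S))
for i in range(S):
    ring[i, (i + 1) % S] = ring[(i + 1) % S, i] = 1.0

for rho in (0.05, 0.2):
    print(rho, local_search_load(star, rho), local_search_load(ring, rho))
```

At the lower rate the star carries the smaller load, while at the higher rate the hub approaches saturation and the ring does better, in line with the crossover at $\rho^*$ described in the text.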

In summary, we have found analytical expressions for the
relationship between topological properties of networks and
their dynamic behavior when faced with local search
with congestion. These expressions allow the calculation of
the search cost in terms of the effective betweenness of the
nodes, which is calculated via the transition probability matrices
(formal expressions of the search algorithm). This formalism
allows one to perform an exhaustive search for optimal
topologies in terms of parallel searchability while avoiding the
simulation of the dynamics of the parallel search process, which is
prohibitive in computational time. Moreover, the formalism
is general enough to deal with other search scenarios (local
searches with knowledge up to second nearest neighbors, third
nearest neighbors and so on; eventually, global knowledge)
by simply redefining the $p^k_{ij}$ elements. We find that the optimal
network topologies for local search with congestion
fall into two categories: a star-like network topology,
which is optimal for a small number of parallel searches, and a
homogeneous-isotropic network topology, which is optimal for
a large number of parallel searches. Strikingly, the transition
between these categories is sharp, i.e. we are not able to
find any optimal network topology different from these two
classes.

FIG. 1: Optimal structures for local search with congestion.
Top: Star-like configuration optimal for $\rho < \rho^*$ (left),
and homogeneous-isotropic configuration optimal for $\rho > \rho^*$
(right). Bottom: Polarization of the optimal structure as a
function of $\rho$, for networks of size $S = 32$ and different numbers
of links $L$.